Many protein systems fold in a two-state manner. Random models, however, rarely display two-state
kinetics and thus such behavior should not be accepted as a default. To date, many theories for
the prevalence of two-state kinetics have been presented, but none sufficiently explains the breadth
of experimental observations. A model, making a minimum of assumptions, is introduced that 
suggests two-state behavior is likely for any system with an overwhelmingly populated native state. 
We show two-state folding is emergent and strengthened by increasing the occupancy population 
of the native state. Further, the model exhibits a hub-like behavior, with slow interconversions 
between unfolded states. Despite this, the unfolded state equilibrates quickly relative to the folding 
time. This apparent paradox is readily understood through this model. Finally, our results compare
favorably with experimental measurements of protein folding rates as a function of chain length and
Keq, and provide new insight into these results. 



INTRODUCTION 

Most small (< 100 residues), single domain proteins
fold in a two-state manner [1-4]. Specifically, protein systems
appear to be thermodynamically two-state - they
have only two equilibrium phases (folded and unfolded) -
and also kinetically two-state, exhibiting single exponential
kinetics. Protein domains that break this two-state
paradigm are usually either large (> 100 residues) and
fold via one or more intermediates [5], or small and
extremely rapidly folding [6-17].

Simple two-state folding kinetics should be considered 
surprising. Protein chains have a large number of in- 
dependent conformations available to them, and fold- 
ing occurs via a stochastic interconversion between these 
conformations, often described as dynamics evolving on 
a "rough" potential energy function. These dynamics 
might be thought of as a network, where the nodes are 
conformations and connections are the rates of interconversion.
It is then interesting to ask what the dynamics
on a random network look like [18, 19], and how they
compare to protein folding. In such random dynamical
systems, two-state behavior is exceptional and rarely seen
[20]. Later in the paper, we will show why. The fact that
two-state behavior is rare in random systems suggests 
that two-state kinetics cannot be accepted as a default. 

Typically two-state kinetics in protein folding is rationalized
in terms of a Kramers rate expression, which
postulates the existence of a single dominant free energy
barrier between folded and unfolded thermodynamic
states. Postulating such a barrier - in effect, enforcing
two-state kinetics - implies two thermodynamic phases.
Here, we investigate the converse; we take two-state thermodynamics
as a postulate and show, without ever invoking
Kramers' theory or an activated process of any
kind, that we can expect such systems to typically be
kinetically two-state when certain conditions are satisfied [21].

One oft-cited explanation is that two-state folding minimizes
aggregation, and is therefore evolutionarily advantageous.
This argument suggests that folding intermediates
are more prone to aggregation than the unfolded or
folded state, and therefore biology has attempted to minimize
their population during folding [13, 22, 23]. While
intriguing and certainly possible, there is little direct evidence
currently supporting this claim.

An alternative reason for the predominance of two- 
state folding suggests that most sequences capable of 
folding are intrinsically two-state. That is, the space of 
sequences that overwhelmingly populate a native state 
is enriched in systems that are thermodynamically and 
kinetically two- or few-state.

We call these competing theories the aggregation hy- 
pothesis, which dictates that two-state behavior is bio- 
logically necessary to avoid aggregation, and the ther- 
modynamic hypothesis, dictating two-state folding is in- 
timately linked to folding sequences whether those se- 
quences were formed in the lab or in vivo [25]. Conclu- 
sive proof of either of these, or rejection of both in fa- 
vor of an alternative, would have a major impact on the 
folding field. If two-state folding is necessary to avoid 
aggregation, this certainly has implications for under- 
standing folding in vivo and developing therapeutics for 
Alzheimer's and other aggregation-related diseases. If 
two-state folding is a physical necessity, then understand- 
ing why is a key element in a complete understanding of 
the protein folding problem.

FIG. 1. Illustration of three different kinetic spectra associated
with three protein systems. Each blue horizontal line
represents one system timescale $\tau_i$. The theory developed
here predicts that as the folded state becomes more populated,
the gap between the slowest and next-slowest timescales will
grow. As this separation grows, so will the likelihood that a
given experiment will see one dominant relaxation timescale,
and classify the system as "two-state". These spectra are idealized
models that ignore the more complex situations of systems
with more than two states. Note that these are timescale
spectra, representing kinetic processes, and therefore are different
from some classic work in folding on thermodynamic
energy spectra (e.g. [24]).

Here, we present an argument in favor of the thermodynamic
hypothesis. We show that for a simple model of
protein folding, if there is an overwhelmingly populated 
native state, two-state thermodynamics and single expo- 
nential kinetics are extremely likely. In this model, two- 
state kinetics are emergent, rather than built-in. This 
allows us to directly analyze the necessary and suffi- 
cient conditions for two-state behavior. Interestingly, we 
find that multi-exponential kinetics can be explained as 
highly perturbed or relatively unstable two-state systems 
(Fig. 1).

Moreover, the model provides an explanation of how two-
state systems might produce a "kinetic hub" [26-28], a
dynamical system where most transitions between any 
two states pass through a third "hub" state [29]. Specif- 
ically, we introduce the concept of slow interconversions 
with rapid equilibration, which shows that a situation 
where the unfolded state equilibrates rapidly does not ne- 
cessitate a situation where specific structures can reach 
each other quickly. This seeming paradox is resolved, 
allowing us to reconcile hub-like kinetics with two-state 
behavior. The presented model reproduces the bimodal 
mean first-passage time distribution from all-atom molec- 
ular dynamics simulations that originally inspired the 
hub hypothesis [30], and still retains two-state folding. 

Finally, and crucially, our model provides an explanation
for why large proteins (≳ 100 residues) typically exhibit
three or more-state folding kinetics (Fig. 2, Fig. 4).
The model predicts that as protein systems get larger
and larger, additional timescales will be experimentally
observable.

FIG. 2. Small proteins fold in a two-state manner, while
larger proteins exhibit additional timescales. Shown are all
proteins in the KineticDB [31], which are classified as two-state
(blue) or multi-state (green). Top panel shows folding
times as a function of size (units of $k_{obs}$ are s$^{-1}$), bottom panel
shows a histogram of kinetic type as a function of number of
residues. At ≈ 100 residues there is a transition between two-state
and multi-state behavior. Insert shows two timescale
spectra for the model discussed in the main text ($N = 100$
states, $\epsilon = 0.01$, left and $\epsilon = 0.05$, right) showing both two-
and higher-state kinetics can be observed in the model. The
model predicts more timescales will be observed experimentally
as proteins get larger and the perturbation parameter $\epsilon$
increases.

In the construction of the model, we postulate only 
that (A) protein dynamics can be represented by a mas- 
ter equation, (B) the system satisfies detailed balance, 
and (C) there exists one folded state that is highly pop- 
ulated compared to all other conformations. From these 
assumptions we build a model that quantitatively con- 
tains no additional information, or, equivalently, is the 
most random, using the maximum entropy formalism.

The maximum entropy method is a natural choice for
studying folding, because in some sense it reflects the 
process by which foldable sequences occur. One could 
conceive of evolution as a random search for a specific 
target - the target being a functional structure, and the 
search occurring through a huge number of possible se- 
quence mutations. The one requirement for this search 
would be that the functional structure is overwhelmingly 
populated at equilibrium. Our model shows that this 
thermodynamic requirement alone is sufficient to explain 
two-state behavior. 

THEORY OF DISCRETE TIME MASTER 
EQUATIONS 

Our model is framed in the language of Markovian
master equations, which give us a framework for implementing
the maximum entropy model mentioned. We
consider a discrete time propagator $T_{\Delta t}$ that describes
the system dynamics. Let $p(t)$ be a function describing
the population density of the system at time $t$ - we are
interested in the time evolution of this function. Momentarily
we will postulate a partitioning of phase space
into $N$ discrete states. In this case, $p(t)$ is a vector, such
that $p_i(t)$ is the population of a discrete state $i$ at time
$t$. Further, $T_{\Delta t}$ is a stochastic matrix, whose elements
$T_{ij}$ describe the probability for the probability density in
state $i$ to transfer to state $j$ in the lag time $\Delta t$,

$$ p(n \Delta t) = p_0 T_{\Delta t}^{n} $$

from initial populations $p_0$. Note that in what follows
the specific lag time used will not be too important, so
we will drop the $\Delta t$ subscript and just write $T$.

The system dynamics can then be understood via the
eigenmodes of the propagator. The system timescales
are given by the eigenvalues of $T$,

$$ \tau_n = -\frac{\Delta t}{\log \lambda_n} $$

while the corresponding eigenvectors describe the exchange
of population between states on those timescales.
The first eigenvalue is always unity ($\lambda_1 = 1$), corresponding
to infinite time. Its corresponding eigenvector is the
stationary distribution, denoted $\pi$.
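
As a minimal numerical sketch of the relation between eigenvalues and timescales, the following Python (NumPy) snippet evaluates $\tau_n = -\Delta t / \log \lambda_n$ and the stationary distribution for a small propagator. The 3-state matrix, the lag time `dt` and the function name `implied_timescales` are assumptions made for illustration only; they are not values taken from the model developed below.

```python
import numpy as np

def implied_timescales(T, dt=1.0):
    """Eigenvalues (descending), non-stationary timescales, stationary distribution."""
    # With row-vector populations p(n dt) = p0 T^n, the stationary distribution
    # is a left eigenvector of T, i.e. a right eigenvector of T transposed.
    evals, evecs = np.linalg.eig(T.T)
    order = np.argsort(-evals.real)
    evals = evals.real[order]
    evecs = evecs.real[:, order]
    pi = evecs[:, 0] / evecs[:, 0].sum()
    taus = -dt / np.log(evals[1:])   # lambda_1 = 1 (infinite timescale) is skipped
    return evals, taus, pi

# Hypothetical 3-state propagator obeying detailed balance with pi = (0.1, 0.1, 0.8).
T = np.array([[0.87, 0.05, 0.08],
              [0.05, 0.87, 0.08],
              [0.01, 0.01, 0.98]])

evals, taus, pi = implied_timescales(T, dt=1.0)
print("eigenvalues:", evals)
print("timescales (units of dt):", taus)
print("stationary distribution:", pi)
```

The eigenvalue closest to one gives the slowest relaxation; smaller eigenvalues correspond to faster processes.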

Each row of $T$ is a probability distribution, and therefore
must be row-normalized and admits a measure of
entropy, loosely speaking the information content of the
probability distribution for each row. Further, the entire
propagator has an associated entropy (sometimes called
the "caliber") [32, 33]. Assuming that the distributions
described by each row are independent (which is identical
to the Markov assumption used to formulate the master
equation), we can write the propagator entropy as

$$ S_T = -\sum_{i,j} T_{ij} \log T_{ij} \qquad (2) $$

Maximizing this function, subject to some restraints describing
known information, gives the model that makes
the fewest assumptions about the system dynamics.
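
To make the propagator entropy concrete, here is a small sketch in Python (NumPy) that evaluates $S_T$ for two extreme cases: a matrix whose rows are all uniform, and the identity matrix. The function name `propagator_entropy` and the choice $N = 4$ are illustrative assumptions. With no restraints at all the uniform matrix is the entropy maximum; the restraints introduced below move the solution away from it.

```python
import numpy as np

def propagator_entropy(T):
    """S_T = -sum_ij T_ij log T_ij, using the convention 0 log 0 = 0."""
    T = np.asarray(T, dtype=float)
    logT = np.zeros_like(T)
    mask = T > 0
    logT[mask] = np.log(T[mask])
    return -np.sum(T * logT)

N = 4
uniform = np.full((N, N), 1.0 / N)   # every row is the uniform distribution
identity = np.eye(N)                 # every row is deterministic (no transitions)

S_uniform = propagator_entropy(uniform)    # analytically N * log(N)
S_identity = propagator_entropy(identity)  # analytically 0
print(S_uniform, S_identity)
```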

THE MAXIMUM ENTROPY FORMALISM 
ALLOWS THERMODYNAMIC POSTULATES TO 
RESULT IN KINETIC MODELS 

Using the maximum entropy principle, let us build a 
model of protein folding. Our goal will be to use well- 
known facts about proteins as starting assumptions, but 
limit these as much as possible. Let us postulate: 

A) Protein dynamics can be described as transitions be- 
tween N distinct states that partition phase space, 
where each state is approximately a single conformation
(we will call these states "microstates", not to
be confused with the thermodynamic folded/unfolded
macrostates). We assume dynamics on this space are 
described by a Markovian propagator T. Finally, we 
expect the number of possible states to be very large 
(famously estimated by Levinthal to be ~ 3^^^), such 
that we will not be too hesitant in assuming N is big. 

B) The system is ergodic and time-reversible, and therefore
the detailed balance condition holds

$$ \pi_i T_{ij} = \pi_j T_{ji} \qquad (3) $$

where $\pi$ describes the stationary solution to $T$ that
is approached asymptotically in time.

C) Proteins have evolved in such a way that there exists
a folded state $F$ that has a much larger equilibrium
population than all other states, i.e. $\pi_F \gg \pi_i$ for all
$i$ that are not $F$.

Solving for the transition matrix $T$ that maximizes (2)
subject to postulates (A, B, C) is a straightforward exercise
in Lagrange multipliers. In what follows, we investigate
analytical solutions for the specific case where there
is one highly populated native state, and all unfolded 
states are equally populated at equilibrium. Mathemat- 
ical detail has been relegated to the supplemental infor- 
mation; we present the results. 

Our key result is a timescale spectrum (the eigenspectrum
of $T$), which is simply a collection of the system
timescales. For instance, in a re-folding experiment a
series of exponential decays might be observed in e.g. a
trace of Trp fluorescence,

$$ A(t) = \sum_i A_i e^{-t/\tau_i} \qquad (4) $$

where $A(t)$ is the observable trace and the $\tau_i$ are the observed
timescales. Finally, the amplitudes are $A_i =
\langle p_0, \psi_i^L \rangle \langle O, \psi_i^R \rangle$, with $p_0$ the initial populations of each
state when the experiment begins, $O$ a vector of the average
observable value for each state, and $\psi_i^L$, $\psi_i^R$ the
$i$th left and right eigenvectors of the propagator, respectively.
This amplitude is the mathematical expression of
familiar physical concepts. First, the initial populations
of each state will affect what experimental response is observed;
the factor $\langle p_0, \psi_i^L \rangle$ represents the extent to which
a prepared sample (with populations $p_0$) will participate
in the mode with timescale $\tau_i$. Second, different experimental
probes will report more strongly on certain states
than others; the factor $\langle O, \psi_i^R \rangle$ captures this uneven reporting.
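
The mode decomposition of an observable trace can be checked numerically. The sketch below (Python/NumPy) uses a hypothetical 3-state reversible propagator, a hypothetical initial population `p0` and a hypothetical observable vector `obs`. It builds a biorthogonal eigenvector pair from the columns of `R` and the rows of `R^{-1}`; which of these is called "left" or "right", and how they are normalized, is a matter of convention, so the pairing used here is an assumption of this sketch rather than a statement about the notation above. Since $\lambda_i^n = e^{-n \Delta t / \tau_i}$, the sum over modes is the discrete-time form of the multi-exponential trace.

```python
import numpy as np

# Illustrative 3-state reversible propagator (values are assumptions).
T = np.array([[0.87, 0.05, 0.08],
              [0.05, 0.87, 0.08],
              [0.01, 0.01, 0.98]])
p0 = np.array([1.0, 0.0, 0.0])    # sample prepared entirely in state 0
obs = np.array([0.0, 0.0, 1.0])   # probe that reports only on state 2

lam, R = np.linalg.eig(T)         # columns of R satisfy T @ R[:, i] = lam[i] * R[:, i]
L = np.linalg.inv(R)              # rows of L satisfy L[i] @ T = lam[i] * L[i], and L @ R = I

# One amplitude per mode: overlap of the initial state and of the probe with that mode.
amplitudes = (p0 @ R) * (L @ obs)

def signal_from_modes(n):
    """Observable after n steps, written as a sum over eigenmodes."""
    return np.sum(amplitudes * lam**n)

def signal_direct(n):
    """Same observable from direct propagation, p0 T^n O."""
    return p0 @ np.linalg.matrix_power(T, n) @ obs

for n in (0, 1, 5, 50):
    print(n, signal_from_modes(n), signal_direct(n))
```

Changing `p0` or `obs` changes the amplitudes but leaves the eigenvalues, and hence the timescales, untouched.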

In such an experiment, the "kinetic spectrum" is just
the collection of $\tau_i$ (Fig. 1). There could be one observed
exponential, two, or many, depending on the system and
experiment. Which timescales are observed will depend
on two factors. First, the amplitude $A_i$ must be
large enough for a given experiment with limited sensitivity
to observe it. Second, $\tau_i$ must be in the appropriate
range of the experiment's temporal resolution. In
this model we compare to experiment by computing the
kinetic spectrum, $\{\tau_i\}$, which is invariant over different
experimental probes and initial conditions; we do not
compute the amplitudes $\{A_i\}$, which depend strongly on
the experiment under consideration.



A SINGLE LOW ENERGY STATE RESULTS IN A 
KINETICALLY AND THERMODYNAMICALLY 
TWO-STATE SYSTEM 

We take $\pi_F \gg \pi_i$ and $\pi_i = \pi_j$, where $i$ and $j$ are
unfolded states. Let's label all such states as $U$, and
label the respective populations as $\pi_F$ and $\pi_U$ ($\pi_U$ is the
population of just one of $N - 1$ unfolded states). With
this, it can be shown that $T$ takes the form

$$ T = \begin{pmatrix}
T_{UU} & \cdots & T_{UU} & T_{UF} \\
\vdots & \ddots & \vdots & \vdots \\
T_{UU} & \cdots & T_{UU} & T_{UF} \\
T_{FU} & \cdots & T_{FU} & T_{FF}
\end{pmatrix} $$

where $T_{UU}$ is the probability of transition from a single
unfolded state to any other, $T_{UF}$ is the probability of
going from an unfolded to a folded state, and $T_{FU}$ is the
probability of the converse. Further, if $\pi_F > \pi_U$, then
$T_{FU} < T_{UU} < T_{UF}$; the differences between these get
larger as the difference in population between $F$ and $U$
states increases.

This matrix is the foundation of our model. In the 
next section, we analyze models very near this maximum 
entropy solution, but where the symmetry in the transi- 
tion probabilities is broken by a small perturbation; we 
begin by analyzing the current result, as it represents the 
simplest case. 

We find three eigenvalues: one stationary ($\lambda_1$), one
corresponding to the folding/unfolding reaction ($\lambda_2$), and
one corresponding to unfolded state dynamics ($\lambda_3$, with
multiplicity $N - 2$),

$$ \begin{aligned}
\lambda_1 &= 1 \\
\lambda_2 &= 1 - T_{UF} - (N - 1) T_{FU} \\
\lambda_3 &= 1 - T_{UF} - (N - 1) T_{UU}
\end{aligned} $$

showing that since $T_{FU} < T_{UU}$, $\lambda_1 > \lambda_2 > \lambda_3$. Further,
for $\pi_F \gg \pi_U$, there will be a significant gap between $\lambda_2$
and $\lambda_3$, leading to a separation of timescales consistent
with a two-state picture, even though the model consists
of $N > 2$ states.
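
These eigenvalue expressions can be verified numerically. The sketch below (Python/NumPy) builds a propagator with the block structure described above for hypothetical parameter values. Two assumptions are made for illustration: the values of `T_UF`, `T_UU` and `pi_F` are chosen by hand rather than obtained from the maximum entropy solution, and the self-transition probabilities on the diagonal are fixed by row normalization. The helper names `block_propagator` and `spectrum` are likewise hypothetical.

```python
import numpy as np

def block_propagator(N, T_UU, T_UF, T_FU):
    """N-state propagator: states 0..N-2 are unfolded, state N-1 is folded."""
    T = np.full((N, N), T_UU)
    T[:-1, -1] = T_UF                          # unfolded -> folded
    T[-1, :-1] = T_FU                          # folded -> each unfolded state
    np.fill_diagonal(T, 0.0)
    np.fill_diagonal(T, 1.0 - T.sum(axis=1))   # self-transitions make rows sum to 1
    return T

def spectrum(T, pi):
    """Eigenvalues (descending) via the symmetrized matrix pi^(1/2) T pi^(-1/2)."""
    d = np.sqrt(pi)
    S = d[:, None] * T / d[None, :]            # symmetric when detailed balance holds
    return np.linalg.eigvalsh(S)[::-1]

N = 100
pi_F = 0.9
pi_U = (1.0 - pi_F) / (N - 1)
pi = np.append(np.full(N - 1, pi_U), pi_F)

T_UF = 0.05
T_FU = T_UF * pi_U / pi_F      # detailed balance: pi_U T_UF = pi_F T_FU
T_UU = 0.005                   # chosen by hand so that T_FU < T_UU < T_UF

T = block_propagator(N, T_UU, T_UF, T_FU)
evals = spectrum(T, pi)

lam2 = 1 - T_UF - (N - 1) * T_FU
lam3 = 1 - T_UF - (N - 1) * T_UU
print("numerical:", evals[0], evals[1], evals[2])
print("formulas: ", 1.0, lam2, lam3)
```

The numerical spectrum contains a single eigenvalue at one, a single eigenvalue at $\lambda_2$, and $N - 2$ copies of $\lambda_3$.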

The final step of our model involves solving an equation
numerically, therefore we cannot write down a closed-form
expression for the $\lambda_2$/$\lambda_3$ gap; it is possible, however,
to plot this timescale gap as a function of $K_{eq}$ or
the number of configuration states $N$. Figure 3 shows the
scaling of the folding timescale $\tau_2 = -\Delta t / \log(\lambda_2)$ with
$K_{eq}$, and compares this scaling with two experimental
systems, lambda repressor [11] and BBL [34]. Further,
$\lambda_3$ is not a function of $K_{eq}$ in this model. Thus, the rightmost
panel of Fig. 3 demonstrates the theoretical scaling
of the gap between the slowest (folding) timescale and
the next-slowest system timescale. A precise treatment
of this second-slowest timescale is performed in the next
section.

It is critical to note that the discussion of modulating
$K_{eq}$ is restricted to changing the conditions for a single
protein sequence. Our model contains no features that
allow it to distinguish between sequences. Thus, we use
the term "stability" to mean the relative population of
the native state under two different conditions,
and not relative stability between mutants or different
proteins (that might be compared via a $\Delta G$ of folding or
melting temperature $T_m$).

Why is a gap between $\lambda_2$ and $\lambda_3$ a signature of two-state
kinetics? In a system where the folding timescale
is much slower than the rest, an experiment designed to
study folding may be poorly suited (too low-resolution)
to measure faster kinetics. Further, probes designed
specifically to study folding may have a large amplitude
response to folding kinetics, but not to other kinetic
modes in the system. Mathematically, if $\langle O, \psi_2^R \rangle$ is maximized
($\psi_2^R$ is the eigenvector describing folding), then
$\langle O, \psi_n^R \rangle$, $n \geq 3$, will be small, since the eigenvectors $\psi_i^R$
are orthogonal. Therefore, experiments well-designed to
study folding will measure other system modes at a much
lower amplitude.

Recently, as higher resolution instrumentation has 
been developed [36], faster timescales such as these have 
been found in folding systems that were previously con- 
sidered two-state. Perhaps the best single example of this 
is the villin headpiece. In [37], the kinetics of villin were
measured by NMR lineshape analysis and fit well to a
two-state model with a folding time of order μs. Later,
laser-induced temperature jump and ultra-fast triplet-triplet
energy transfer experiments revealed additional dynamical
processes, including intermediate formation at
70 ns [38] and native state locking/unlocking at 170 ns
[39]. Thus, the villin timescale spectrum has one relatively
slow mode, its folding time, and at least two faster
modes. This is consistent with our model, which predicts
proteins have one slow mode ($\lambda_2$) and a number of faster
modes ($\lambda_n$, $3 \leq n \leq N$).

FIG. 3. A comparison of the theory and experiment: the change in observed folding timescale as a function of $K_{eq}$, modulated
experimentally by temperature. Left: a series of lambda repressors from [11]. Middle: BBL, from [34]. Right: this work (values
of $K_{eq} < 1$ not plotted). Since $\tau_3$ is not a function of $K_{eq}$, this plot also demonstrates the gap between $\tau_2$ and $\tau_3$ as a function
of $K_{eq}$. Insert: theoretical dependence of folding times on chain length is exponential assuming $N \sim e^{aR}$, which is consistent
with experiment (Fig. 2), though is not the only consistent model [35].

In systems where the separation between slow and fast 
modes is very large - larger than the separation for villin, 
for instance - it is likely that an experiment designed 
to measure the slow folding timescale will not measure 
faster timescales. Thus, only one kinetic timescale is ever 
seen experimentally, and we call such systems kinetically
"two-state".

We have observed the model is kinetically two-state,
but it is also (by construction, due to the choice of
$\pi$) thermodynamically two-state, as demonstrated by a
sharply peaked $C_V$ curve. Denote the free energy of each
state by $A_U$ for unfolded and $A_F$ for the folded states.
Then set our scale of energy such that $A_U = 0$. With
this, we can write the ratio of state populations

$$ \frac{\pi_F}{\pi_U} = e^{-\beta A_F} $$

Now, scale our units of temperature such that the folding
temperature, where $\pi_F = 0.5$, occurs at $\beta = 1$. Then it
is clear from the above that $A_F = -\log(N - 1)$. The
partition function is

$$ Z = \sum_i e^{-\beta A_i} = (N - 1) + (N - 1)^{\beta} $$

resulting in a heat capacity $C_V$ that exhibits a first order phase
transition at the melting temperature $\beta = 1$,
consistent with what is observed in experiment (Fig. 4).

FIG. 4. Left: the calculated $C_V$ vs $\beta$ curve (for $N = 100$),
showing a phase transition at the melting temperature $\beta = 1$.
Right: experimental dependence of $\tau_3$ with protein chain
length $R$. Data from 12 multi-state proteins reported in the
KineticDB [31].
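
The heat capacity curve can be sketched numerically from the partition function $Z = (N - 1) + (N - 1)^{\beta}$. The Python (NumPy) snippet below assumes the standard statistical-mechanics relation $C_V = \beta^2 \, \partial^2 \log Z / \partial \beta^2$ in units where $k_B = 1$; this definition, the finite-difference step `h`, and the function names are assumptions of the sketch, not expressions taken from the text. With this definition and finite $N$, the maximum lies close to, though not exactly at, $\beta = 1$.

```python
import numpy as np

def log_Z(beta, N):
    """log of Z = (N - 1) + (N - 1)**beta, written in a numerically stable form."""
    a = np.log(N - 1)
    return a + np.logaddexp(0.0, a * (beta - 1.0))

def heat_capacity(beta, N, h=1e-3):
    """C_V = beta^2 d^2(log Z)/d(beta)^2 by central finite differences (k_B = 1)."""
    d2 = (log_Z(beta + h, N) - 2.0 * log_Z(beta, N) + log_Z(beta - h, N)) / h**2
    return beta**2 * d2

N = 100
betas = np.linspace(0.2, 3.0, 2801)
cv = heat_capacity(betas, N)
beta_peak = betas[np.argmax(cv)]

# Folded population at beta = 1: (N-1)^beta / Z, which is one half by construction.
pi_F_at_1 = (N - 1) ** 1.0 / ((N - 1) + (N - 1) ** 1.0)
print("peak of C_V at beta =", beta_peak)
print("folded population at beta = 1:", pi_F_at_1)
```

Increasing `N` sharpens the peak, since the width of the transition scales inversely with $\log(N - 1)$.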



PERTURBATION OF THE MODEL SHOWS 
TWO-STATE FOLDING IS ROBUST 

While this simple model demonstrates a minimal set 
of sufficient requirements for two-state folding, it retains
an artificial symmetry - all the rates in each set
$\{T_{UU}\}$, $\{T_{UF}\}$, $\{T_{FU}\}$ are identical. Such symmetry can
be broken by adding random "noise" to the transition 
matrix elements. Robust two-state folding should not be 
affected by such a perturbation - experimentally, a single 
mutation or slight change in experimental conditions is 
insufficient to disrupt two-state behavior in the majority 
of systems. 

A reasonable perturbation is the addition of a random
Gaussian to each element of $T$, in the form of a matrix
$T'$ whose elements are derived from Gaussians

$$ \tilde{T} = T + \epsilon T' $$

where primes denote perturbing terms, tildes the resulting
perturbed solution, and $\epsilon$ is a control parameter
denoting the size of the perturbation [40]. In the supplemental
information, we show that one can construct such
a perturbation while ensuring $\tilde{T}$ maintains detailed balance
and is a stochastic matrix, in the thermodynamic
limit ($N \to \infty$). The perturbation derives from a symmetric
matrix whose elements are drawn from a Gaussian
distribution - such a matrix is known as a member of the
Gaussian orthogonal ensemble (GOE) [41].

FIG. 5. The model's eigenspectra ($\{\lambda_n\}$) under no perturbation
(left), a small perturbation (middle left), and a large
perturbation (middle right). These plots are for a single instance
of the distribution indicated in the main paper, with $\epsilon$
set as indicated. The far right is the spectrum of one sample
from the GOE (red line is origin). In the absence of any native
bias ($\pi_F \sim \pi_U$), the kinetic spectrum would be qualitatively
similar to just the GOE (with all of the eigenvalues positive).
Parameters were $N = 100$ and $K_{eq} = 1.0$.

From this, we find the probability density of obtaining
a spectrum $P(\{\lambda_n\})$ under a random perturbation is

$$ P(\{\lambda_n\}) = C \, \prod_{n=3}^{N} (\cdots) \, \prod_{i \geq 3}^{N} \prod_{j > i}^{N} |\lambda_i - \lambda_j| $$

where we have considered only the eigenvalues representing
dynamics in the unfolded state ($\lambda_3$). Here $C$ is simply
a constant that normalizes the distribution.

This perturbation has the effect of splitting the degeneracy
and spreading out the previously overlapping
eigenvalues (Fig. 5). One can see this by noticing that
the terms of form $|\lambda_i - \lambda_j|$ require the probability density
to go to zero as two eigenvalues get close together. This effect
is known as level repulsion in random matrix theory.

What consequences does this perturbation have for
protein folding? One can see that for small perturbations,
the symmetry of the original degenerate model
is broken, but a spectral gap between the unfolded
timescale and the folding timescale ($\lambda_2$) still exists (Fig. 5)
[42]. Once perturbations get very large, however, one
expects that the timescales of unfolded state dynamics
will spread sufficiently to be comparable to the folding
timescale. This will destroy the two-state features of
the model, and shows that, as stated in the introduction,
highly random models will not exhibit two-state
behavior. We conclude that while two-state folding for
this model is relatively robust, under severe perturbations
multi-exponential, non-two-state kinetics may arise.
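
The effect of a small symmetry-breaking perturbation on the spectrum can be illustrated numerically. The sketch below (Python/NumPy) is not the construction from the supplemental information: as a simple stand-in, it multiplies the symmetric off-diagonal fluxes $\pi_i T_{ij}$ by $(1 + \epsilon G_{ij})$, with $G$ a symmetric Gaussian matrix, and then resets the diagonal so that rows sum to one. This keeps detailed balance and stochasticity by construction, but the meaning of `eps` here is therefore not identical to the $\epsilon$ of the additive form $\tilde{T} = T + \epsilon T'$. The parameter values and helper names are hypothetical.

```python
import numpy as np

def block_propagator(N, T_UU, T_UF, T_FU):
    """Unperturbed model: states 0..N-2 unfolded, state N-1 folded."""
    T = np.full((N, N), T_UU)
    T[:-1, -1] = T_UF
    T[-1, :-1] = T_FU
    np.fill_diagonal(T, 0.0)
    np.fill_diagonal(T, 1.0 - T.sum(axis=1))
    return T

def perturb_reversible(T, pi, eps, rng):
    """Multiplicative symmetric Gaussian noise on the fluxes pi_i T_ij (a stand-in)."""
    N = len(pi)
    G = rng.normal(size=(N, N))
    G = (G + G.T) / np.sqrt(2.0)                       # symmetric, GOE-like
    flux = pi[:, None] * T                             # symmetric under detailed balance
    flux_new = flux * np.clip(1.0 + eps * G, 0.0, None)  # stays symmetric and non-negative
    T_new = flux_new / pi[:, None]
    np.fill_diagonal(T_new, 0.0)
    np.fill_diagonal(T_new, 1.0 - T_new.sum(axis=1))   # restore row normalization
    return T_new

def spectrum(T, pi):
    """Eigenvalues in descending order, via the symmetrized propagator."""
    d = np.sqrt(pi)
    return np.linalg.eigvalsh(d[:, None] * T / d[None, :])[::-1]

N, pi_F, T_UF, T_UU = 100, 0.9, 0.05, 0.005
pi_U = (1.0 - pi_F) / (N - 1)
pi = np.append(np.full(N - 1, pi_U), pi_F)
T0 = block_propagator(N, T_UU, T_UF, T_UF * pi_U / pi_F)

rng = np.random.default_rng(0)
T_pert = perturb_reversible(T0, pi, eps=0.1, rng=rng)

ev0, ev1 = spectrum(T0, pi), spectrum(T_pert, pi)
gap_0, gap_pert = ev0[1] - ev0[2], ev1[1] - ev1[2]   # folding mode vs. next-slowest mode
spread_0, spread_pert = ev0[2] - ev0[-1], ev1[2] - ev1[-1]
print("gap:   ", gap_0, "->", gap_pert)
print("spread:", spread_0, "->", spread_pert)
```

In this sketch the degenerate unfolded-state eigenvalues spread out under the perturbation, while the gap separating them from the folding mode remains.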

THREE OR GREATER STATE BEHAVIOR MAY 
BE OBSERVED, ESPECIALLY IN LARGE 
SYSTEMS. 

Random matrix theory also provides an estimate for
the relative timescale of the slowest non-folding process, $\tau_3$,
as a function of the size of the random perturbation $\epsilon$
and chain length $R$, assuming the exponential scaling
$N \sim e^{aR}$ [43]. While the expression for $\tau_3$ contains too
many unknown parameters ($\epsilon$, $a$, $\Delta t$) to make meaningful
quantitative predictions of experiment, it does suggest
that the slowest non-folding timescales in proteins should
increase with chain length. This is consistent with what
is seen experimentally for the 12 multi-state proteins for
which the slowest non-folding timescale is reported in the
literature (Fig. 4) [31] - additional data will be necessary
to definitively confirm this prediction.

THE MODEL EXHIBITS NATIVE HUB-LIKE 
BEHAVIOR 

The mean first passage time (MFPT) is the expected
time it takes for a walker starting at state $i$ of a Markov
chain to reach state $j$ for the first time. It is apparent
from inspection that, due to the fact that $T_{UU} < T_{UF}$,
we expect the MFPT from an unfolded state to any other
unfolded state to be slower than the passage time from
that state to the native state. A plot of the distributions
of MFPTs from every state to every other is therefore
bimodal (Fig. 6), a property that has been described as
"hub-like" [30] after it was witnessed in all-atom molecular
dynamics simulations [26, 44, 45] (recently more sensitive
measures of hub-like phenomenology have been proposed
and employed a model very similar to this one
[46]). Numerical simulations show a small perturbation
spreads out such a distribution, but does not destroy
the bimodality (Fig. 6). The system only exhibits these
hub-like behaviors when there is a single native state.
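
The two MFPT populations can be computed directly for the unperturbed block-structured propagator. The sketch below (Python/NumPy) uses the standard linear-system formulation of first passage times, $m = (I - Q)^{-1} \mathbf{1}$ with $Q$ the propagator restricted to non-target states. The parameter values and the helper names `block_propagator` and `mfpt_to_target` are hypothetical choices for illustration, not values from the maximum entropy solution.

```python
import numpy as np

def block_propagator(N, T_UU, T_UF, T_FU):
    """States 0..N-2 are unfolded, state N-1 is folded; diagonal from normalization."""
    T = np.full((N, N), T_UU)
    T[:-1, -1] = T_UF
    T[-1, :-1] = T_FU
    np.fill_diagonal(T, 0.0)
    np.fill_diagonal(T, 1.0 - T.sum(axis=1))
    return T

def mfpt_to_target(T, j):
    """Mean first passage times (in steps) from every state to state j."""
    N = T.shape[0]
    keep = np.arange(N) != j
    Q = T[np.ix_(keep, keep)]                    # transitions among non-target states
    m = np.linalg.solve(np.eye(N - 1) - Q, np.ones(N - 1))
    out = np.zeros(N)                            # MFPT from j to itself is set to 0 here
    out[keep] = m
    return out

N, pi_F, T_UF, T_UU = 100, 0.9, 0.05, 0.005
pi_U = (1.0 - pi_F) / (N - 1)
T_FU = T_UF * pi_U / pi_F                        # detailed balance
T = block_propagator(N, T_UU, T_UF, T_FU)

mfpt_UF = mfpt_to_target(T, N - 1)[0]            # unfolded state 0 -> folded state
mfpt_UU = mfpt_to_target(T, 1)[0]                # unfolded state 0 -> unfolded state 1

# Ensemble relaxation timescales (lag time taken as one step) from the eigenvalue formulas.
tau2 = -1.0 / np.log(1 - T_UF - (N - 1) * T_FU)  # folding
tau3 = -1.0 / np.log(1 - T_UF - (N - 1) * T_UU)  # unfolded-state equilibration

print("MFPT U -> F :", mfpt_UF, "(analytically 1 / T_UF)")
print("MFPT U -> U':", mfpt_UU)
print("tau_2, tau_3:", tau2, tau3)
```

For these parameters the passage time between two specific unfolded states is far longer than the passage time to the native state, even though the unfolded ensemble relaxes faster than the folding timescale.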

Slow MFPTs between unfolded states might seem inconsistent
with the fact that $\lambda_3$, the eigenvalues corresponding
to the unfolded state dynamics, represent
timescales that are much faster than the folding time.
This "paradox" arises because the eigenvalue ($\lambda_3$) is a
measure of the ensemble dynamics of the system, while
MFPTs measure dynamics at the level of a single protein
visiting specific states. Imagine the following two cases.
If we monitor a single protein molecule as it folds, it will
visit many unfolded states before folding. However, the
chance that it reaches one particular unfolded state before
folding is very small - it is much more likely to visit
the native state before this single unfolded state. Quantitatively,
this results in a smaller MFPT for $U \to F$
transitions than $U \to U$ transitions (Fig. 6).

FIG. 6. MFPT distribution (normalized) for an unperturbed
(black, $\delta$-functions), slightly perturbed (blue, $\epsilon = 10^{-\ldots}$) and
significantly perturbed (red, $\epsilon = 10^{-\ldots}$) models. The distribution
is bimodal, or "hub-like". Shown are 1000 numerical
samples for $N = 100$ and $K_{eq} = 1.0$. There are some large

Next, consider an ensemble of proteins. Because they 
each visit a large number of unfolded states before reach- 
ing the single native state, they are able to spread out 
quickly, equilibrating all of the unfolded states, before 
these unfolded states have a chance to equilibrate with 
the native state. This results in a system that exhibits 
slow interconversions with rapid equilibration. 

This phenomenon is purely a result of dividing the un- 
folded state into many parts - in effect, increasing the res- 
olution of non-native dynamics. Our model, being phe- 
nomenological, cannot address whether or not significant 
energetic or enthalpic barriers exist in the non-native en- 
semble. This is an important outstanding question that 
will require further work to address. It is important to 
note that, at high spatial resolution, hub-like kinetics 
might be present purely due to the size of the unfolded 
state space, regardless of whether such barriers exist. A
corollary of this is that at low resolution, this hub-like 
behavior will disappear - in the limit of two states, it is 
by definition impossible to have any kind of network hub. 

This interpretation is consistent not only with traditional
views of rapid unfolded-state equilibration, but
also recent reports of relatively slow interconversions between
non-native conformations [30, 45, 47, 48]. This
model provides a lens for reconciling these views.



CONCLUSIONS 

What are the minimal sufficient features for a fold- 
ing sequence to exhibit two-state kinetics and thermody- 
namics? Is two-state folding biologically advantageous, 
or a physical requirement? The model presented here 
suggests that simply a large enough energy gap between 
folded and unfolded states is enough to result in two-state 
behavior. 

Further, this model 

• Explains why two-state systems are common in 
small proteins, but additional timescales appear in 
larger proteins. The model attributes this to the 
nature of protein thermodynamics, and does not 
invoke an aggregation-based evolutionary hypothe- 
sis. 

• Shows why additional fast timescales, usually not 
directly involved in folding, can be observed in tra- 
ditionally two-state systems such as villin. 

• Displays two-state kinetics without making refer- 
ence to or assuming an activated process. Agnosti- 
cism in this regard allows us to analyze non-native 
dynamics in ways that models that begin with an 
activated process cannot. 

• Reconciles the hub-like kinetics observed in simulation
with two-state kinetics, through the concept
of rapid equilibration with slow interconversions.

• Proposes a new interpretation of multi-exponential
kinetics in fast folding proteins, and this mechanism
is seen to be in agreement with reported
experiments. The model makes clear predictions
about how folding times change with respect to
$K_{eq}$.

• Introduces an approach based on maximizing the
entropy of a dynamical propagator given known information
as a way of probing protein folding theoretically.
More sophisticated models, that include
more detailed structural information and precise 
state energy structures, may yield additional ex- 
perimental predictions. 

Each of these items provides either new insight into em- 
pirical observations made in experiments or simulations 
that were previously poorly understood, or pushes the 
methods employed currently in the construction of analytical
theories of protein folding.

This model brings into focus two significant research
questions that remain unresolved. First, this model says 
little about what the topologies of realistic propagators 
of protein dynamics look like, and to what extent those 
topologies dictate protein dynamics. Second, we have 
so far been unable to address the nature of dynamics 
in the unfolded state. Whether or not these non-native 
dynamics are restricted by significant barriers remains an 
open question, one that requires a microscopic theory (as 
opposed to the phenomenological theory presented here) 
validated by careful experimentation and simulation to fully
understand. 



SUPPLEMENTAL INFORMATION 

Supplementary information provides mathematical de- 
tail. It includes the Lagrange multiplier-based solution 
of the maximum entropy propagator T, an analysis of 
the eigenvalues and eigenvectors of that propagator, a 
perturbation-theoretic approach to the stability of that 
eigenstructure, and a calculation of the timescale gap between
$\lambda_2$ and $\lambda_3$. The results of these calculations were
presented in the main paper. 